Optimizing PGRs for in vitro shoot proliferation of pomegranate with Bayesian-tuned ensemble stacking regression and NSGA-II: a comparative evaluation of machine learning models

Background Optimizing in vitro shoot proliferation is a complicated task, as it is influenced by the interactions of many factors as well as by genotype. This study investigated the role of various concentrations of plant growth regulators (zeatin and gibberellic acid) in the successful in vitro shoot proliferation of three Punica granatum cultivars (‘Faroogh’, ‘Atabaki’ and ‘Shirineshahvar’). In addition, the utility of five Machine Learning (ML) algorithms, namely Support Vector Regression (SVR), Random Forest (RF), Extreme Gradient Boosting (XGB), Ensemble Stacking Regression (ESR) and Elastic Net Multivariate Linear Regression (ENMLR), as modeling tools was evaluated for in vitro multiplication of pomegranate. A new automatic hyperparameter optimization method, the Adaptive Tree-structured Parzen Estimator (ATPE), was applied to tune the hyperparameters. The performance of the models was evaluated and compared using statistical indicators (MAE, RMSE, RRMSE, MAPE, R and R²), while a Global Performance Indicator (GPI) was introduced to rank the models based on a single parameter. Moreover, the Non‑dominated Sorting Genetic Algorithm‑II (NSGA‑II) was employed to optimize the selected prediction model. Results The ESR algorithm exhibited higher predictive accuracy than the other ML algorithms and was subsequently linked to NSGA‑II for optimization. ESR-NSGA‑II revealed that the highest proliferation rate (3.47, 3.84, and 3.22), shoot length (2.74, 3.32, and 1.86 cm), leaf number (18.18, 19.76, and 18.77), and explant survival (84.21%, 85.49%, and 56.39%) could be achieved with a medium containing 0.750, 0.654, and 0.705 mg/L zeatin, and 0.50, 0.329, and 0.347 mg/L gibberellic acid in the ‘Atabaki’, ‘Faroogh’, and ‘Shirineshahvar’ cultivars, respectively. Conclusions This study demonstrates that the ‘Shirineshahvar’ cultivar exhibited lower shoot proliferation success than the other cultivars. The results indicated the good performance of ESR-NSGA-II in modeling and optimizing in vitro propagation. ESR-NSGA-II can be applied as an up-to-date and reliable computational tool for future studies in plant in vitro culture.


Background
Over the past decade, the pomegranate tree (Punica granatum L.) has attracted significant attention as an economically important "super fruit" cultivated throughout the world, particularly in arid and semiarid regions. This is due to its high medicinal value, rich content of bioactive compounds such as antioxidant polyphenols, and numerous health benefits [1,2]. Traditional methods of propagating pomegranates include sexual propagation through seeds and vegetative methods. However, both conventional approaches face several limitations that make pomegranate propagation difficult. Vegetative methods are time-consuming, dependent on seasonal production, and labor-intensive. Moreover, a large number of plants derived from cuttings fail to survive [3]. Sexual methods, on the other hand, are challenging due to high heterozygosity and a long juvenile period. In addition, seedlings propagated by these methods are strongly affected by pest infestation and diseases [4]. Therefore, to achieve large-scale pomegranate cultivation, in vitro cell and organ culture techniques have been developed. Plant tissue culture methods offer a promising approach for the rapid production of true-to-type pomegranate plants and the biotechnological exploitation of pomegranate and other plant species with valuable properties [5]. Previous studies have attempted to apply in vitro culture techniques to propagate different cultivars of pomegranate [6,7]. However, the findings have clearly emphasized that pomegranate micropropagation is moderately difficult and can vary depending on the cultivar, probably due to genetic variation among cultivars [6,8]. The successful propagation of economically important woody plant species like pomegranate therefore still presents challenges, owing to problems that emerge during the proliferation stage, including defoliation of explants, shoot tip necrosis, callusing, and hyperhydricity. These physiological disorders arise from factors such as undesirable medium composition, unsuitable type and concentration of plant growth regulators (PGRs), microbial contamination, phenolic browning caused by phenol secretion, ethylene accumulation, and tissue recalcitrance to proliferation (Fig. 1) [8-10].
The successful in vitro propagation of fruit trees is an intricate process that is influenced by numerous factors, including culture conditions, plant materials, and the composition of culture media, particularly PGRs [11]. Extensive research has emphasized the crucial role of PGRs, such as cytokinins and auxins, and their different combinations with gibberellic acid (GA3) in promoting shoot regeneration in different pomegranate cultivars [7]. However, PGRs differ in how effectively they promote proliferation. For example, 6-γ,γ-dimethylallylaminopurine (2-iP) has been reported to have lower proliferative efficiency, while 6-Benzylaminopurine (BAP), a commonly used cytokinin in tissue culture, can produce short and thin shoots, sometimes accompanied by excessive callus proliferation.

Fig. 1 A schematic view of different factors that influence physiological disorders of in vitro plants
Among the cytokinins, zeatin (ZT), a natural cytokinin, has been found to play a vital role in stimulating the maximum number of axillary buds and is applied at various concentrations either alone or in combination with other growth regulators. ZT is considered desirable for its stability in nutrient media, as it does not easily degrade, thus providing sustained benefits for rapid and high rates of proliferation in most plant explants [12,13]. Although different growth regulators, including BAP, kinetin, thidiazuron (TDZ), GA3, and IBA, have been used in various combinations with or without ZT to stimulate axillary buds, GA3 is particularly known for inducing rapid shoot elongation, which is beneficial for subsequent rooting. Considering the high cost of ZT, researchers are actively exploring the combined use of ZT with other cytokinins while maintaining the proliferative potential of shoot cultures [14]. However, it is important not to overlook the role of ZT in ensuring a good rate of proliferation [12]. It is also crucial to acknowledge that the responses of different pomegranate cultivars to in vitro propagation vary significantly depending on the interacting factors during the in vitro process, even in closely related species [15]. Therefore, to achieve optimal results, the in vitro culture conditions must be optimized for each cultivar.
In vitro micropropagation is a multifactorial and complex biological process influenced by genotype/cultivar and by various interacting factors that must be accounted for when optimizing it. Traditional statistical techniques encounter significant challenges in deciphering large datasets of biological interactions, especially when the datasets are nonlinear, complex, noisy, and ambiguous in nature, as observed in in vitro culture processes [16]. To overcome these challenges, advanced computer-based technologies such as Machine Learning (ML) tools have emerged as capable solutions for analyzing and predicting complex and multivariate datasets with high accuracy. ML approaches offer the advantage of learning autonomously and transforming data into useful information without being explicitly programmed [17]. Recent studies have highlighted the superior predictive performance of ML models over traditional statistics in various in vitro culture systems, including optimizing culture conditions for shoot proliferation and rooting [10,18,19], androgenesis [20], seed germination [21], somatic embryogenesis [22], gene transformation [23], and enhancing secondary metabolite biosynthesis [24].
Among the various algorithm-based ML tools, ensemble learning methods have gained significant attention due to their simplicity and their ability to produce powerful and robust predictions. These methods can be broadly categorized into bagging, boosting, and stacking/blending. Three prominent ensemble learning methods are Extreme Gradient Boosting (XGB), which uses the boosting concept; Random Forest (RF), based on the bagging concept; and Ensemble Stacking Regression (ESR), based on the stacking concept [25]. Support Vector Machine (SVM) is a robust ML method that has been widely recognized for its accuracy in plant in vitro micropropagation [19,26]. One notable advantage of SVM is its ability to handle high-dimensional data effectively. Researchers have also explored SVM with small training datasets, highlighting its ability to provide accurate and reliable predictions even with limited training data [27]. The Elastic Net Multivariate Linear Regression (ENMLR) was introduced by Zou and Hastie [28] as a robust approach for analyzing high-dimensional datasets. It was designed to overcome the limitations of the LASSO method. By combining regularized regression techniques, ENMLR regularizes and selects important predictor variables, thereby improving the prediction accuracy of sparse models. This method has demonstrated its value in addressing multicollinearity among predictor variables [29]. Selecting the most appropriate ML method depends on the association between input and output variables, as well as on the optimization of hyperparameters [19]. In addition, combining ML techniques with evolutionary optimization algorithms confers significant advantages in predicting the critical factors that influence plant growth parameters in in vitro culture systems. One powerful algorithm in this regard is the non-dominated sorting genetic algorithm-II (NSGA-II), which is widely recognized as a search algorithm for multi-objective optimization problems. NSGA-II enables efficient solving and prediction of complex processes while also providing a simplified interpretation of the results [30]. In previous studies, the combination of ML with NSGA-II (ML-NSGA-II) has been acknowledged as a robust modeling technique for complex datasets, for example in optimizing in vitro tissue culture protocols across micropropagation phases [21,31,32] and in various other plant science fields [30,33].
To the best of our knowledge, the application of ML algorithms as a strategy for modeling and predicting the in vitro shoot proliferation of pomegranate plants remains largely unexplored. The objectives of this study were (i) to evaluate the effects of ZT at different concentrations and in combination with GA3 on optimizing the tissue culture protocol of three commercially significant cultivars, namely 'Faroogh', 'Atabaki' and 'Shirineshahvar'; (ii) to compare the robustness of the most commonly used ML algorithms, including SVR, RF, XGB, ESR, and ENMLR, in terms of their ability to model and optimize the in vitro shoot proliferation process of pomegranate cultivars; and (iii) to employ NSGA-II to predict the most effective level of PGRs for enhancing the proliferation of pomegranate. To our knowledge, this study is the first application of ML models for optimizing pomegranate tissue culture media. In addition, despite the potential advantages of ESR and ENMLR, no study has been conducted on applying these procedures in plant science.

Plant material and explant preparation
The experiments were conducted using single nodal explants from three pomegranate cultivars: 'Faroogh', 'Atabaki' and 'Shirineshahvar'. These explants were obtained from pomegranate plants grown in a greenhouse of the College of Agriculture, Shiraz University, Iran. Explants were pre-sterilized using a liquid soap solution and rinsed several times with tap water. Subsequently, the explants were surface-sterilized by immersing them in 70% aqueous ethanol for 30 s, followed by treatment with 5% sodium hypochlorite for 10 min. Afterward, the explants were washed three times with sterilized distilled water in a laminar airflow chamber. Following the sterilization process, the stem explants were cut into 2-3 cm segments with lateral buds (Fig. 2a).

In vitro culture establishment
A preliminary test was carried out using different combinations of culture media: MS (Murashige and Skoog) [34], VS (Van der Salm) [35], WPM (woody plant medium) [36], half-strength MS, and modified MS (mMS); PGRs (BAP and NAA); phenol-controlling compounds (polyvinylpyrrolidone, ascorbic acid, and activated charcoal); and silver nitrate (AgNO3) as an ethylene inhibitor. The main experiment was set up based on the pre-test results, which indicated that the mMS medium supplemented with activated charcoal and AgNO3 in combination with either BAP or NAA was the best treatment for stimulating new shoot regeneration. In this experiment, the explants (2-3 cm stem segments with lateral buds) were immediately cultured in capped glass containers containing 25 mL of mMS as a basal medium supplemented with 1 mg/L BAP, 0.5 mg/L NAA, 250 mg/L activated charcoal, 4.5 mg/L AgNO3, 0.7% agar, and 3% sucrose. To obtain the best hormonal composition for the pomegranate proliferation protocol, the effects of different concentrations of GA3 (0, 0.1, 0.25, and 0.5 mg/L) and ZT (0, 0.25, 0.5, and 0.75 mg/L) on shoot proliferation were evaluated. Prior to autoclaving at 121 ℃ for 15 min, the pH of the medium was adjusted to 5.7-5.8. To mitigate tissue browning, the cultures were incubated in darkness for 7 days in a growth chamber at 25 ± 2 ℃, and then transferred to a 16-h photoperiod with a light intensity of 80 µmol m⁻² s⁻¹ and an 8-h dark period. After three subcultures on the same culture medium, the following morphological responses were measured for each cultivar: proliferation rate (PR; number of new shoots per explant), shoot length (SL; length of newly regenerated shoots per explant in cm), leaf number (LN; number of leaves per explant), and explant survival (ES; survival rate of explants in percent) (Fig. 3a).

Fig. 2 In vitro propagation of pomegranate cultivar 'Faroogh'. a Single-node explants, b shoot proliferation in mMS medium supplemented with 0.750 mg/L zeatin and 0.500 mg/L gibberellic acid, c shoot proliferation in control medium, and d shoots propagated in mMS medium supplemented with 0.750 mg/L zeatin and 0.500 mg/L gibberellic acid

Experimental design and data analysis
The proliferation experiment was carried out using a Completely Randomized Design (CRD) with a factorial arrangement. Each set of treatments consisted of 20 replicates, and subcultures were conducted over a three-week period. Analysis of variance was performed using SAS statistical software (version 9.4; SAS Institute, Cary, NC).

Description of ML models and optimization algorithm
Model development
In this study, we employed a range of ML algorithms to build computational models, using the datasets as training and testing data. Specifically, we selected the most widely used ML algorithms, namely SVR, RF, XGB, ENMLR, and ESR, to analyze the effect of the independent variables on in vitro pomegranate plant growth responses. These five ML algorithms were applied to the different pomegranate cultivars ('Faroogh', 'Atabaki', and 'Shirineshahvar'), with two independent variables (the concentrations of GA3 and ZT) as inputs and four plant growth responses (PR, SL, LN, and ES) as outputs. Prior to ML modeling, data scaling was employed to standardize the training set for each cultivar. Each feature was transformed to have a mean of zero and a variance of one using Eq. 1. Additionally, Principal Component Analysis (PCA) was used to identify outliers; however, none were found. To train and test all five models, the experimental data (960 data points) were randomly divided into training (80%) and testing (20%) sets.

$$X_{std} = \frac{X_o - \mu}{\sigma} \tag{1}$$

where $X_{std}$ is the standardized value, $X_o$ is the original value, and $\mu$ and $\sigma$ are the mean and standard deviation, respectively.
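The scaling and splitting steps can be illustrated with a minimal standard-library sketch. It assumes that the 960 data points correspond to 4 GA3 levels × 4 ZT levels × 20 replicates × 3 cultivars (320 rows per cultivar), and that the scaler is fitted on the training portion only; the function names and the random seed are hypothetical and are not taken from the study.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (training, testing) subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(column):
    """Return (mean, population standard deviation) of a training column."""
    return statistics.fmean(column), statistics.pstdev(column)

def standardize(column, mu, sigma):
    """Apply Eq. 1: X_std = (X_o - mu) / sigma."""
    return [(x - mu) / sigma for x in column]

# Assumed design grid for ONE cultivar: 4 GA3 x 4 ZT levels, 20 replicates each
ga3_levels = [0.0, 0.1, 0.25, 0.5]   # mg/L
zt_levels = [0.0, 0.25, 0.5, 0.75]   # mg/L
rows = [(g, z) for g in ga3_levels for z in zt_levels for _ in range(20)]

train, test = train_test_split(rows)                # 80% / 20%
mu, sigma = fit_scaler([r[0] for r in train])       # fit on training data only
ga3_std = standardize([r[0] for r in train], mu, sigma)
```

Fitting the mean and standard deviation on the training rows and reusing them for the test rows keeps information from the test set out of the model inputs.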

Hyperparameter optimization in ML models
In ML, tuning the hyperparameters in advance plays a crucial role in training the models [37]. These hyperparameters have a significant impact on prediction accuracy and overall performance. Various strategies exist for hyperparameter optimization, including babysitting, grid search, random search, and Bayesian optimization [38]. Among these strategies, Bayesian optimization is widely recognized for its generalizability across different test sets and its ability to reach optimal hyperparameters in fewer iterations. In this study, a novel automatic hyperparameter tuning algorithm called the Adaptive Tree-structured Parzen Estimator (ATPE) was used for Bayesian optimization. This algorithm adjusted the initial hyperparameters of the five ML models to achieve optimized performance. It has not yet been applied to the optimization of in vitro PGRs. To improve the generalization performance of these models and avoid overfitting and underfitting, the ATPE method was combined with K-fold cross-validation (K = 10). The process is illustrated in Fig. 3b. The hyperparameters of the ML models and their search spaces are shown in Table 1. The cross-validation was run for folds 1 to 10, and each run served the ATPE algorithm for optimal ML model selection and hyperparameter tuning. In each run, one fold was selected as the validation set, while the remaining folds were used to train the model. In this way, all data points were involved in the training process.
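The K-fold procedure can be sketched as follows. This is an illustrative standard-library implementation, not the code used in the study: `k_fold_indices` and `cross_val_score` are hypothetical names, and the averaged validation score is the kind of objective that a tuner such as ATPE would minimize for each candidate set of hyperparameters.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists; each sample validates exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_val_score(fit, score, X, y, k=10):
    """Average validation score of a model-fitting function over k folds.

    fit(X_train, y_train) returns a model; score(model, X_val, y_val) returns a number.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

With 768 training rows (80% of 960) and K = 10, each fold holds 76 or 77 rows, and every row is used for validation once and for training nine times.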

Support vector regression (SVR)
SVM is a supervised ML method developed by Vapnik [39]. Initially developed for classification problems (Support Vector Classifier, SVC), SVM was later extended to handle regression problems (SVR) [40]. The fundamental concept behind SVR is the use of a kernel function to map the original input data from its lower-dimensional representation into a higher-dimensional feature space, in which the regression is estimated. Unlike Artificial Neural Network (ANN) models, which often encounter multiple local minima, SVM provides a unique solution at the global optimum. The approximated function within the SVR algorithm can be expressed as follows:

$$f(x) = \omega^{T}\varphi(x) + b \tag{2}$$

where $f(x)$ represents the estimated output value, $\omega$ denotes the weight vector, $\varphi(x)$ is the feature mapping, and $b$ is the bias. In the training objective, $\frac{1}{2}\lVert\omega\rVert^{2}$ represents the regularization term, while $\varepsilon$ (epsilon) defines the insensitive tube. The approximated function in Eq. (2) can be expressed explicitly by incorporating Lagrange multipliers and applying the optimality constraints. By introducing the Lagrange multipliers $a_i$ and $a_i^{*}$, the function is given by:

$$f(x) = \sum_{i=1}^{n}\left(a_i - a_i^{*}\right)K(x_i, x) + b$$

where $K(x_i, x)$ represents the kernel function. The Radial Basis Function (RBF) non-linear kernel maps the input vectors nonlinearly into a high-dimensional feature space. In this study, the RBF kernel was used due to its superior estimation performance compared to other kernel functions.
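The roles of the RBF kernel and the ε-insensitive tube can be made concrete with a short sketch. It uses the standard textbook definitions (RBF kernel exp(−γ‖x − z‖²) and loss max(0, |y − f(x)| − ε)); the parameter values, the function names, and the single support vector are illustrative assumptions, not values from the study.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon tube cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Kernel expansion f(x) = sum_i c_i * K(x_i, x) + b, with c_i = a_i - a_i*."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical model with one support vector at (GA3, ZT) = (0.5, 0.75) mg/L
prediction = svr_predict((0.5, 0.75), [(0.5, 0.75)], [2.0], bias=1.0)
```

Only the training points with non-zero dual coefficients (the support vectors) contribute to the prediction, which is why the model stays compact even when the kernel feature space is high-dimensional.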

Random forest (RF)
RF is a classification and regression algorithm introduced by Breiman [41]. It addresses the performance limitations of single decision trees and exhibits favorable characteristics such as robustness to noise and outliers, scalability, and parallelism in high-dimensional data classification tasks. RF overcomes the "dimensionality disaster" often encountered in big data scenarios, in which other models frequently fail to perform effectively. Additionally, RF demonstrates error rates comparable to those of other methods across various learning tasks and exhibits a reduced tendency to overfit. Notably, RF is a well-known bagging algorithm that excels in regression problems [38]. The RF algorithm combines decision tree-based techniques with ensemble methods, leveraging the benefits of both, which makes it a suitable choice as one of the foundational models in the ensemble model employed in this study. The formula of RF is as follows:

$$f(x_i) = \frac{1}{K}\sum_{k=1}^{K} T_{D(\theta_k)}(x_i)$$

where $x_i$ refers to the sample value, $D(\theta_k)$ denotes a different bootstrapped sample, $T_{D(\theta_k)}$ is the tree grown on that sample, and $K$ is the number of trees.

eXtreme Gradient Boosting (XGB)
XGB is an advanced supervised learning algorithm proposed by Chen and Guestrin [42]. This method is based on the Gradient-Boosted Decision Tree (GBDT) approach. XGB aims to create a "strong" learner by combining predictions from a collection of "weak" learners using an additive training strategy. The algorithm incorporates a second-order Taylor expansion of the loss function and a regularization term, which effectively mitigates overfitting. The prediction of the ensemble is:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

where $\hat{y}_i$ and $f_k$ are the predicted value and the k-th tree, respectively, and $x_i$ represents the input variable.
To prevent overfitting without sacrificing the computational speed of the model, XGB employs an analytical expression to evaluate the "goodness" of the model in relation to the original function. This expression provides an estimate of the model's "goodness" while also reducing the computational cost:

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K}\Omega\left(f_k\right)$$

where $l$ is the loss function, $n$ indicates the number of observations used, and $\Omega$ denotes the regularization term, defined as:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\lVert\omega\rVert^{2}$$

where $\omega$ denotes the vector of scores associated with the leaves, $T$ is the number of leaves, $\lambda$ represents the regularization parameter, and $\gamma$ indicates the minimum loss reduction required for further partitioning of a leaf node.

Elastic net multivariate linear regression (ENMLR)
ENMLR is a regression technique that combines two effective shrinkage regression methods: Ridge regression (L2 penalty) and LASSO regression (L1 penalty). Ridge regression is employed to address high multicollinearity, while LASSO regression performs feature selection by shrinking regression coefficients. The elastic net estimator in ENMLR benefits from ridge regularization, which allows for better handling of correlations between predictors than LASSO regression. Simultaneously, the L1 regularization in the elastic net promotes sparsity, facilitating the identification of essential features. However, as with LASSO regression, the bias issue is still present in ENMLR. The elastic net estimator minimizes the following expression:

$$\hat{\beta} = \arg\min_{\beta}\left\{\lVert y - X\beta\rVert^{2} + \lambda_1\sum_{j=1}^{p}\lvert\beta_j\rvert + \lambda_2\sum_{j=1}^{p}\beta_j^{2}\right\}$$

where $y$ is the response vector, $X$ is the matrix of the $p$ predictor variables, $\beta$ is the vector of regression coefficients, $\beta_j$ is the regression coefficient of the j-th predictor variable, and $\lambda_1$ and $\lambda_2$ are the tuning parameters coming from LASSO and Ridge, respectively, which take positive numeric values ($\lambda_1, \lambda_2 > 0$). $\lambda$ is a penalty parameter that acts as a shrinkage variable, and its numerical value indicates the severity of the penalty.

Ensemble stacking regression (ESR)
The stacking regressor, initially introduced by Wolpert [43], is an effective ensemble learning technique that combines multiple regression models to improve prediction accuracy. In this approach, a meta-regressor is trained to aggregate the predictions of the base regressors, thereby leveraging the collective knowledge of the individual models [44]. Different techniques, such as stacking, weighted averaging, and direct averaging, can be employed to create ensemble regressors by integrating the predictions of the base models [45]. The choice of technique depends on finding an optimal balance for combining the predictions, and the meta-regressor can be any type of regression model [46]. To implement stacking regression, the new meta-feature sets generated by each base regressor are merged to form the meta training set, and the new target sets produced by each base regressor are combined to create the meta testing set. The final predictions are then generated by the meta-regressor, which is trained using the new meta training set [25]. The stacking regression methodology has gained popularity in various domains, including molecular quantum characteristics [44], daily reference evapotranspiration estimation [25], genome prediction [47], and stock portfolio prediction [48]. In this study, the XGB, SVR, and ENMLR models were used as the base regressors, while RF was employed as the meta-regressor.
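The construction of the meta training set can be sketched in a few lines. This is a toy illustration under stated assumptions: two deliberately simple learners (`fit_mean` and `fit_nearest`, both hypothetical) stand in for the XGB, SVR, ENMLR and RF models of the study, the meta-features are built from out-of-fold predictions (a common practice that the text does not specify), and the response values are made up.

```python
def fit_mean(X, y):
    """Toy base learner: always predicts the training mean."""
    mu = sum(y) / len(y)
    return lambda x: mu

def fit_nearest(X, y):
    """Toy learner: predicts the target of the closest training point."""
    def predict(x):
        best = min(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[best]
    return predict

def fit_stack(base_fits, meta_fit, X, y, k=4):
    """Train base learners, then a meta-learner on their out-of-fold predictions."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    meta_X = [[0.0] * len(base_fits) for _ in range(n)]
    for val_idx in folds:
        held_out = set(val_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, fit in enumerate(base_fits):
            model = fit(X_tr, y_tr)
            for i in val_idx:
                meta_X[i][j] = model(X[i])  # prediction for an unseen row
    base_models = [fit(X, y) for fit in base_fits]  # refit on all rows
    meta_model = meta_fit(meta_X, y)
    return lambda x: meta_model([m(x) for m in base_models])

# Made-up response on the GA3 x ZT grid, for illustration only
X = [(g, z) for g in (0.0, 0.1, 0.25, 0.5) for z in (0.0, 0.25, 0.5, 0.75)]
y = [1.0 + g + 2.0 * z for g, z in X]
stack = fit_stack([fit_mean, fit_nearest], fit_nearest, X, y)
prediction = stack((0.5, 0.75))
```

Using held-out predictions as meta-features prevents the meta-regressor from simply rewarding whichever base model memorized the training rows best.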

Performance evaluation
To evaluate and compare the accuracy and performance of the developed ML algorithms in predicting the proliferation of pomegranate, six popular quantitative statistical indicators were used: the correlation coefficient (R), Coefficient of Determination (R²), Root Mean Square Error (RMSE), Relative Root Mean Squared Error (RRMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). These indicators are described in Table 2.
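A minimal sketch of the six indicators is given below, using their commonly used textbook definitions. Two choices are assumptions rather than facts taken from Table 2: RRMSE is computed as RMSE divided by the mean of the observed values, and R² as one minus the ratio of residual to total sum of squares. The function name `evaluate` is hypothetical.

```python
import math

def evaluate(observed, predicted):
    """Return MAE, RMSE, RRMSE, MAPE (%), R and R2 for paired observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    rmse = math.sqrt(sse / n)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": rmse,
        "RRMSE": rmse / mean_o,                       # assumed normalization
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "R": cov / math.sqrt(ss_o * ss_p),
        "R2": 1.0 - sse / ss_o,                       # assumed definition
    }

# Illustrative values only: predictions are perfectly correlated but compressed
metrics = evaluate([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

In this example R equals 1 while R² is 0.75, which shows why the two indicators are reported separately: a model can track the trend perfectly and still miss the magnitude.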

Global performance indicator (GPI)
To enhance the accuracy and reliability of the statistical analysis and to mitigate potential discrepancies among indicators, we employed the GPI method, which was first introduced by Despotovic et al. [49]. GPI combines the effects of multiple statistical indicators into a single value. First, all statistical indicators are scaled to a range between 0 and 1. Subsequently, each scaled value of a statistical indicator is subtracted from the median of that indicator across all models. These differences are then aggregated using appropriate weighting factors (a weight of −1 for R and R² and a weight of 1 for all other statistical indicators). The model with the highest GPI value is considered the best. The following equation represents the GPI:

$$GPI_i = \sum_{j}\alpha_j\left(M_j^{S} - I_{ij}^{S}\right)$$

where $GPI_i$ represents the global performance indicator for model $i$, $M_j^{S}$ is the median of the scaled values of indicator $j$, $I_{ij}^{S}$ is the scaled value of indicator $j$ for model $i$, and $\alpha_j$ equals −1 for both R and R² and 1 for the other performance criteria.
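The GPI calculation can be sketched as follows. The sketch assumes min-max scaling across models for each indicator and the sign convention in which a higher GPI is better (median minus scaled value, with a weight of −1 for R and R²); the function names and the three example models with their indicator values are made up for illustration.

```python
import statistics

def min_max_scale(values):
    """Scale a list of values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def global_performance_indicator(table, higher_is_better=("R", "R2")):
    """table maps indicator name -> {model name: value}; returns {model: GPI}."""
    models = list(next(iter(table.values())))
    gpi = {m: 0.0 for m in models}
    for name, by_model in table.items():
        scaled = dict(zip(models, min_max_scale([by_model[m] for m in models])))
        median = statistics.median(scaled.values())
        alpha = -1.0 if name in higher_is_better else 1.0
        for m in models:
            gpi[m] += alpha * (median - scaled[m])
    return gpi

# Made-up indicator values for three hypothetical models
table = {
    "RMSE": {"A": 1.0, "B": 2.0, "C": 3.0},
    "R": {"A": 0.9, "B": 0.8, "C": 0.7},
}
gpi = global_performance_indicator(table)
ranking = sorted(gpi, key=gpi.get, reverse=True)
```

A model that sits at the median of every indicator receives a GPI of 0, which is consistent with the zero values reported for some models in the results.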

Optimization of ML model via non-dominated sorting genetic algorithm-II (NSGA-II)
The best ML algorithm was introduced as the fitness function to the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), which served as the optimization algorithm, in order to find the optimal combination of inputs (GA3 and ZT) for achieving maximal growth responses in the three cultivars (Fig. 3c). NSGA-II is based on natural selection, and several parameters were set to ensure the effectiveness of the optimization process. The first step in the NSGA-II process involved the creation of an initial population, in which all the chromosomes were constructed. The tournament selection method was then adopted to select an elite population for crossover. A binary crossover function, a well-known crossover technique, was used to generate the next generation of chromosomes. To introduce diversity into the population and prevent convergence to local optima, a mutation operator was applied. It introduced random variations into the chromosomes, reducing the possibility of having similar chromosomes within the population [50]. The non-dominated sorting concept was used to derive non-dominated solutions, with each non-dominated front assigned a rank or level. The non-dominated front with the highest rank was removed, and the remaining solutions were used to generate the parent population for the next generation. Crowding distance was employed to estimate the density of solutions surrounding each solution, and the solutions were sorted by crowding distance in descending order, so that solutions in less dense regions were given priority. To achieve an improved fitness function during the optimization process, the optimal values of crucial operators such as the crossover rate, maximum generation, initial population, and mutation rate were determined through trial and error. In the current study, the crossover rate was set at 90% with a distribution index of 15, the maximum generation was set to 200, the initial population size was 100, and a distribution index of 20 was used for the mutation operator, which was real-valued polynomial mutation (real_pm) (Fig. 3c).

Table 2 Description of statistical indicators for the constructed models evaluation
Correlation coefficient (R): a statistical measure that quantifies the degree of correlation between observed and predicted values. The model's predictability improves as R approaches 1
Coefficient of determination (R²): represents the proportion of the variance in the observed data that is explained by the regression model
Where n is the total number of measurements, O i and P i are the observed and predicted values, and O and P stand for the means of the observed and predicted values, respectively
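The two ranking ingredients of NSGA-II described in this section, non-dominated sorting and crowding distance, can be sketched with the standard library. This is a simple illustrative version for maximization problems, not the bookkeeping-based "fast" sort of the original algorithm and not the implementation used in the study; the objective vectors are made-up values for two growth responses.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(points):
    """Return lists of indices: front 0 is non-dominated, front 1 is next, and so on."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(points, front):
    """Larger distance means a less crowded neighborhood within the front."""
    distance = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")  # keep extremes
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[cur] += (points[nxt][m] - points[prev][m]) / (hi - lo)
    return distance

# Made-up (PR, SL) predictions for five candidate media
candidates = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0), (2.0, 1.5)]
fronts = non_dominated_fronts(candidates)
distances = crowding_distance(candidates, fronts[0])
```

Candidates are compared first by front rank and then, within a front, by crowding distance, so the search keeps a spread of trade-offs among the four growth responses rather than collapsing onto a single solution.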

The effect of PGRs on in vitro shoot proliferation and development of pomegranate
According to the factorial ANOVA, the growth responses of pomegranate, including LN, PR, ES, and SL, were significantly influenced by the different concentrations and combinations of PGRs (GA3 and ZT), as well as by the cultivar type. The detailed results can be found in Table 3.
The addition of ZT to the growth medium, particularly at a concentration of 0.75 mg/L, resulted in improved shoot regeneration and favorable vegetative growth characteristics per explant when compared to the control medium. Based on the results in Table 3, although the positive changes in the growth parameters were primarily attributed to increasing concentrations of PGRs and the interaction between them, the combination of the highest concentrations of ZT and GA3 was the most effective treatment in promoting the overall growth response. Specifically, when the medium was supplemented with 0.50 mg/L GA3 and 0.75 mg/L ZT, the average growth response was significantly enhanced (Table 3). It is important to note that the observed changes in the growth parameters differed by cultivar. Among the three cultivars studied, the 'Faroogh' cultivar exhibited the maximum values of LN (23.62) and PR (4). Similarly, the 'Atabaki' cultivar showed the highest SL (6.75 cm) when treated with 0.50 mg/L GA3 and 0.75 mg/L ZT. Regarding ES, both the 'Faroogh' and 'Atabaki' cultivars reached a maximum ES of 100% under the three treatments combining 0.25, 0.50, or 0.75 mg/L ZT with 0.50 mg/L GA3. In contrast, the 'Shirineshahvar' cultivar exhibited lower ES rates than the other cultivars. For this cultivar, the same treatment interaction led to the highest values of LN (18.94), PR (3.56), ES (61.87%), and SL (1.95 cm). Overall, the highest and lowest growth responses were achieved in 'Faroogh' and 'Shirineshahvar', respectively (Table 3).

Comparison of ML performance
In the present study, five ML algorithms, namely RF, XGB, SVR, ESR, and ENMLR, were used to build the mathematical models. The scatter plots in Figs. 5, 6 and 7 illustrate the prediction results of these models, while the corresponding evaluation indexes are shown in Tables 4, 5, and 6. Violin plots of the performance metrics are presented in Fig. 4. For all parameters (outputs), the R-values of the ENMLR, which measure the correlation between observed (experimental) and predicted values, were lower than those of the other ML algorithms in both the training and testing subsets. Nevertheless, all five ML models showed good performance and predictability. The ESR, with higher R and R² and smaller RRMSE, RMSE, MAE, and MAPE values in both the training and testing sets, was the best algorithm in comparison to the four other models for all growth parameters (Tables 4, 5 and 6). However, comparing the statistical indicators of the different models on the measured growth parameters revealed that the values of the ESR were very close to those of the other ML algorithms in all three cultivars. Moreover, the individual statistical indicators did not clearly separate the models, and different indicators favored different models; therefore, to resolve this ambiguity, the GPI for the test dataset of all ML algorithms was calculated and is presented in Table 7. The GPI ranked the ESR model as the top performer among all models. The calculated GPI values of ESR vs. XGB, RF, SVR, and ENMLR were 1.829 vs. −1.674, 0.647, 0, and −4.171 for LN of the 'Atabaki' cultivar; 1.312 vs. −2.562, 0, 0.525, and −4.688 for LN of the 'Faroogh' cultivar; and 0.089 vs. −3.040, 0.032, 0.004, and −5.911 for LN of the 'Shirineshahvar' cultivar (Table 7).
Additionally, the regression lines demonstrated a good fit between the observed and predicted data for all growth parameters during both the training and testing phases of the ML models (Figs. 5, 6, and 7).
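As an illustration of how the statistical indicators and the GPI ranking could be computed, the following is a minimal sketch in plain Python. It is not the authors' implementation: the RRMSE definition (RMSE as a percentage of the observed mean), the GPI form (indicators scaled to [0, 1], summed as signed distances from the median, with R and R² entering with a negative sign so that a larger GPI is better), and the toy data with hypothetical model names are all assumptions made here for illustration.

```python
import math
from statistics import median


def metrics(observed, predicted):
    """Return the six indicators named in the text for one model."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": rmse,
        "RRMSE": 100.0 * rmse / mean_obs,  # assumed definition
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "R": cov / math.sqrt(var_obs * var_pred),
        "R2": 1.0 - sse / var_obs,
    }


def gpi(table):
    """table: {model: {indicator: value}} -> {model: GPI score} (assumed GPI form)."""
    models = list(table)
    scores = dict.fromkeys(models, 0.0)
    for ind in next(iter(table.values())):
        values = [table[m][ind] for m in models]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid division by zero when all values tie
        scaled = {m: (table[m][ind] - low) / span for m in models}
        med = median(scaled.values())
        alpha = -1.0 if ind in ("R", "R2") else 1.0  # larger R and R2 are better
        for m in models:
            scores[m] += alpha * (med - scaled[m])
    return scores


# Toy data (hypothetical, not taken from the study).
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
preds = {
    "model_a": [10.1, 11.9, 14.2, 15.8, 18.1],
    "model_b": [11.0, 11.0, 15.0, 15.0, 19.0],
    "model_c": [8.0, 13.0, 12.0, 18.0, 15.0],
}
table = {name: metrics(observed, p) for name, p in preds.items()}
scores = gpi(table)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Collapsing several indicators into one signed score is what allows a single ranking when individual indicators disagree between models.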

Optimization process via non-dominated sorting genetic algorithm-II
The NSGA-II algorithm, a multi-objective evolutionary optimization method, was linked to the ESR model, which had been identified as the most accurate algorithm. The ESR-NSGA-II algorithm determined the optimal values of four growth parameters (LN, PR, ES, and SL) in response to different concentrations of PGRs. The results of the ESR-NSGA-II algorithm are summarized in Table 8. In the 'Atabaki' cultivar, the ESR-NSGA-II algorithm identified the culture medium supplemented with 0.750 mg/L ZT and 0.50 mg/L GA3 as giving the greatest improvements in growth parameters. Specifically, this combination produced the best outputs, with 18.18 LN, 3.47 PR, 84.21% ES, and 2.74 cm SL. For the 'Faroogh' cultivar, the optimization algorithm determined that 0.654 mg/L ZT and 0.329 mg/L GA3 were the optimal inputs, giving 19.76 LN, 3.84 PR, 85.49% ES, and 3.32 cm SL. In the 'Shirineshahvar' cultivar, 0.705 mg/L ZT combined with 0.347 mg/L GA3 were the optimal inputs, giving 18.77 LN, 3.22 PR, 56.39% ES, and 1.86 cm SL (Table 8).
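To make the multi-objective step more concrete, the sketch below implements the two ranking operations at the core of NSGA-II, non-dominated sorting and crowding distance, over a grid of candidate ZT and GA3 concentrations. It is only a sketch under stated assumptions: `toy_surrogate` is a hypothetical stand-in for the trained ESR model (its coefficients are invented for illustration), the candidate grid is arbitrary, and the evolutionary loop (selection, crossover, mutation) of the full algorithm is omitted.

```python
from itertools import product


def toy_surrogate(zt, ga3):
    """Hypothetical stand-in for the trained ESR model; returns (LN, PR, ES, SL)."""
    ln = 10 + 12 * zt + 4 * ga3
    pr = 4 - 8 * (zt - 0.65) ** 2 - 6 * (ga3 - 0.35) ** 2
    es = 85 - 60 * (zt - 0.5) ** 2 + 20 * ga3
    sl = 3 - 5 * (ga3 - 0.3) ** 2 + zt
    return (ln, pr, es, sl)


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Split point indices into successive Pareto fronts (front 0 is the best)."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices that point i dominates
    counts = [0] * n                    # how many points dominate point i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(front, objs):
    """Larger distance means a less crowded region of the front."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        low, high = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep extremes
        if high == low:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (high - low)
    return dist


# Candidate media: ZT in [0, 0.75] and GA3 in [0, 0.50] mg/L (arbitrary grid).
candidates = [(i / 20, j / 20) for i, j in product(range(16), range(11))]
objectives = [toy_surrogate(zt, ga3) for zt, ga3 in candidates]
fronts = non_dominated_sort(objectives)
pareto = fronts[0]
distance = crowding_distance(pareto, objectives)
for idx in sorted(pareto, key=lambda i: distance[i], reverse=True)[:5]:
    print(candidates[idx], [round(v, 2) for v in objectives[idx]])
```

In a full NSGA-II run, these two rankings decide which candidates survive each generation; a single recommended medium per cultivar would then be picked from the final front by an additional selection rule.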

Discussion
The success of in vitro plant tissue culture strongly depends on several external and internal factors, including environmental conditions, PGR types, culture medium composition, gelling agents, and genotype [18]. PGRs, particularly cytokinin and auxin, are commonly applied to optimize protocols for in vitro tissue culture and shoot regeneration [17,54,55]. Auxin increases the susceptibility of the less mitotically active apical meristem cells to cytokinin [56], while cytokinin promotes cell proliferation, including cell division and shoot elongation [10]. In the case of pomegranate, which is a recalcitrant woody plant for in vitro culture, the optimization of the type and concentration of PGRs, as well as their interactions, plays a crucial role [8,57-59].
Previous studies on the efficient multiplication of various pomegranate species have reported that BAP, with or without NAA, at concentrations ranging from 0.4 to 2 mg/L for BAP and 0.5 to 1 mg/L for NAA, is effective [57]. However, it is important to note that the results of these studies are often specific to particular cultivars and cannot be universally applied. The optimization of PGR concentrations is necessary due to genetic factors and complexities associated with the oxidation of phenols in explants and culture media, which can lead to tissue death. Furthermore, pomegranate tissue culture protocols are highly dependent on the cultivar and may differ due to variations in uptake rates, translocation rates, or metabolic processes within the meristematic regions of the plant. Additionally, cytokinin metabolism plays a crucial role, as cytokinins may undergo degradation or conjugation with sugars or amino acids, leading to the formation of biologically inert compounds, as reported by Desai et al. [60].
Although ZT has been recognized as highly effective in promoting shoot proliferation in various plant species [61-63], its use in pomegranate tissue culture has been limited compared with other cytokinins. Similarly, the use of GA3 in shoot proliferation, particularly in recalcitrant woody trees like pomegranate, has received limited attention. However, several studies have demonstrated that the interaction between cytokinins and GA3 can improve the development of shoot/root apical meristems [8,64,65]. This study introduces a new shoot proliferation protocol for pomegranate cultivars, which utilizes a combination of ZT and GA3. The results demonstrate that the 'Shirineshahvar' cultivar was more recalcitrant to shoot proliferation than the other cultivars. This could be attributed to variations in the concentration of endogenous phytohormones within the plants and their interaction with the applied exogenous PGRs in the culture of explants [67].
Developing and optimizing tissue culture protocols is a complex task that poses significant challenges to the field as a whole. The multifactorial nature of in vitro culture processes makes them difficult to understand and interpret using traditional statistical approaches such as ANOVA, t-tests, correlation, and regression, especially when the variables investigated are nonlinear, noisy, complex, and vague in nature [68]. ML methods, as complex mathematical tools, offer promise in understanding and interpreting the intricate, nonlinear relationships within datasets. ML models have demonstrated superior predictive power over traditional statistical methodologies when analyzing unpredictable variables and large datasets. Despite these advantages, uncertainty in ML outcomes remains a major constraint in its application [69]. Uncertainty in ML studies arises from three primary sources: data quality, the sample of data collected from the domain, and model fitting [70]. To reduce such uncertainty, researchers have recommended applying several different ML algorithms [69,70]. In this study, five ML approaches (XGB, RF, SVR, ESR, and ENMLR) were employed to model the effects of PGRs on in vitro shoot proliferation of pomegranate. While the ML models performed similarly in predicting pomegranate shoot multiplication, the GPI analysis indicated that the ESR model was the best performer. It exhibited robustness and superior predictive accuracy in both the training and testing subsets. It is worth noting that there is a lack of specific investigations regarding the use of the ESR algorithm in the field of plant tissue culture. Nonetheless, numerous studies in other scientific disciplines have demonstrated the robust performance of the ESR model in various prediction tasks [71,72].
Recent research has shown that integrating optimization algorithms, particularly NSGA-II, with ML models can provide valuable insights and make effective use of the models. Applying NSGA-II in conjunction with ML makes it possible to answer "how to get" questions by identifying the culture medium that simultaneously improves multiple desired parameters [18,73]. In the current research, the ESR was linked to the NSGA-II algorithm as a computational forecasting approach for predicting and identifying critical factors affecting the in vitro proliferation stage of pomegranate cultivars. The successful application of optimization algorithms, especially NSGA-II, in plant tissue culture has already been reported [31]. Additionally, various ML algorithms coupled with different optimization algorithms have shown promising results in modeling and predicting optimal plant tissue culture media for other fruit tree species such as kiwi berry [18], pear [74], prunus [15], pistachio rootstocks [74], and Persian walnut [10]. The ESR-NSGA-II method predicted that the highest plant growth responses would be achieved by supplementing the culture medium with 0.750 mg/L ZT and 0.500 mg/L GA3 for the 'Atabaki' cultivar, 0.654 mg/L ZT and 0.329 mg/L GA3 for the 'Faroogh' cultivar, and 0.705 mg/L ZT and 0.347 mg/L GA3 for the 'Shirineshahvar' cultivar. Overall, the ESR-NSGA-II algorithm revealed that the interaction between genotype and different concentrations of PGRs had the most significant influence on pomegranate shoot proliferation. These findings are consistent with a study by Sadat-Hoseini et al. [10], which employed ML approaches to model growth parameters of in vitro Persian walnut using different concentrations of BAP, thidiazuron (TDZ), and indole butyric acid (IBA), and reported that the genotype-PGR interaction plays a crucial role in the proliferation of Persian walnut.

Fig. 4 Violin plots of the performance metrics of the analyzed models for observed vs. predicted values of in vitro pomegranate growth parameters: A leaf number, B proliferation, C explant survival, D shoot length
To the best of the authors' knowledge, this study is the first to examine the specific effects of ZT and GA3, as well as their interaction, on the efficiency of the pomegranate tissue culture protocol, particularly for the cultivars studied here under in vitro conditions. While previous studies have reported successful in vitro shoot proliferation of different pomegranate cultivars, the focus on the specific combination of ZT and GA3 and their interaction effects is a novel aspect of this research. By evaluating the influence of these growth regulators on growth parameters, this study contributes to the advancement of pomegranate tissue culture techniques.

Conclusion
In vitro shoot proliferation is a multifactorial and complex process influenced by various interacting factors. Therefore, to evaluate the extensive datasets and optimize the pomegranate protocol, ML techniques such as RF, SVR, XGB, ESR, and ENMLR were employed as promising alternatives to traditional statistical methods. Based on our results, ESR-NSGA-II exhibited superior

Fig. 3 Schematic diagram of the step-by-step procedure of the present research: A pomegranate micropropagation, B modeling of growth parameters with ML algorithms based on K-fold cross-validation and the ATPE algorithm, and C optimization of growth parameters via the non-dominated sorting genetic algorithm-II (NSGA-II)

In the SVR model, ω denotes the weight vector and b represents the bias. The values of ω and b are determined by minimizing the regularized risk function, which is expressed as:

$$R(C) = C\,\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(d_i, y_i) + \frac{1}{2}\lVert \omega \rVert^{2}$$

where C represents the penalty parameter that balances the trade-off between model complexity and training error, d_i denotes the desired value, y_i the predicted value, n represents the total number of observations, and $C\,\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(d_i, y_i)$ is the empirical error. The following equation is employed to determine the ε-insensitive loss function ($L_{\varepsilon}$):

$$L_{\varepsilon}(d, y) = \begin{cases} \lvert d - y \rvert - \varepsilon, & \lvert d - y \rvert \ge \varepsilon \\ 0, & \text{otherwise} \end{cases}$$
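The regularized risk function and the ε-insensitive loss referred to in this passage can be evaluated numerically as in the minimal sketch below. It assumes the standard textbook form of both expressions and a linear model y = ω·x + b; the feature map actually used in the study is not stated here, and the inputs, weights, and parameter values are hypothetical.

```python
def eps_insensitive_loss(desired, predicted, eps):
    """Zero inside the eps-wide tube around the target, linear outside it."""
    residual = abs(desired - predicted)
    return residual - eps if residual >= eps else 0.0


def regularized_risk(weights, bias, xs, ds, C, eps):
    """Empirical error C * (1/n) * sum(L_eps) plus complexity term 0.5 * ||w||^2.

    Assumes a linear model y = w . x + b (an illustrative simplification).
    """
    n = len(xs)
    preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in xs]
    empirical = C * sum(eps_insensitive_loss(d, y, eps) for d, y in zip(ds, preds)) / n
    complexity = 0.5 * sum(w * w for w in weights)
    return empirical + complexity


# Hypothetical inputs: (ZT, GA3) in mg/L and a made-up response value.
xs = [[0.25, 0.0], [0.5, 0.25], [0.75, 0.5]]
ds = [2.0, 3.0, 4.0]

perfect_fit = regularized_risk([2.0, 2.0], 1.5, xs, ds, C=10.0, eps=0.1)
flat_model = regularized_risk([0.0, 0.0], 3.0, xs, ds, C=10.0, eps=0.1)
print(perfect_fit, flat_model)
```

The comparison shows the trade-off controlled by C: the flat model has no complexity cost but pays for every residual outside the tube, while the fitted model pays only the weight-norm term.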

Fig. 5 Comparison between the observed and predicted values obtained via the RF, XGB, SVR, ESR, and ENMLR models. A leaf number, B proliferation, C explant survival, D shoot length of the pomegranate cultivar 'Atabaki'

Fig. 6 Comparison between the observed and predicted values obtained via the RF, XGB, SVR, ESR, and ENMLR models. A leaf number, B proliferation, C explant survival, D shoot length of the pomegranate cultivar 'Faroogh'

Table 1
Hyperparameter tuning of the constructed models using ATPE
SVR Support Vector Regression, RF Random Forest, XGB Extreme Gradient Boosting, ENMLR Elastic Net Multivariate Linear Regression, ESR Ensemble Stacking Regression

i) represents the learner at step d, the predictions at steps d and d − 1 are denoted as f

As R² approaches 1, the model's ability to account for the variability in the data improves

MAPE interpretation: excellent for MAPE < 10%, good for 10% < MAPE < 20%, acceptable for 20% < MAPE < 50%, inaccurate for MAPE > 50%
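The MAPE interpretation scale quoted here can be applied programmatically, as in the short sketch below. The source states strict inequalities on both sides of each threshold, so assigning the boundary values (exactly 10%, 20%, and 50%) to the less favorable category is an assumption made for this example, and the sample values are hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def mape_category(mape_percent):
    """Map a MAPE value (percent) to the qualitative scale quoted in the text.

    Boundary values fall into the less favorable class (an assumption).
    """
    if mape_percent < 10:
        return "excellent"
    if mape_percent < 20:
        return "good"
    if mape_percent < 50:
        return "acceptable"
    return "inaccurate"


# Hypothetical observed and predicted leaf numbers.
observed = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 42.0]
value = mape(observed, predicted)
print(round(value, 2), mape_category(value))
```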

Table 3
Effect of different concentrations of PGRs on in vitro growth parameters of pomegranate cultivars

The results are expressed as the mean ± standard deviation (n = 20). GA3 gibberellic acid, ZT zeatin, PR proliferation rate, SL shoot length, LN leaf number, ES explant survival. Bold values indicate the highest (best) values.

Table 4
Statistical evaluation of the constructed models for the micropropagation of the pomegranate cultivar 'Atabaki'

SVR Support Vector Regression, RF Random Forest, XGB Extreme Gradient Boosting, ENMLR Elastic Net Multivariate Linear Regression, ESR Ensemble Stacking Regression, R² coefficient of determination, RRMSE Relative Root Mean Square Error, RMSE Root Mean Square Error, MAPE Mean Absolute Percentage Error, PR proliferation rate, SL shoot length, LN leaf number, ES explant survival

Table 5
Statistical evaluation of the constructed models for the micropropagation of the pomegranate cultivar 'Faroogh'
SVR Support Vector Regression, RF Random Forest, XGB Extreme Gradient Boosting, ENMLR Elastic Net Multivariate Linear Regression, ESR Ensemble Stacking Regression, R² coefficient of determination, RRMSE Relative Root Mean Square Error, RMSE Root Mean Square Error, MAPE Mean Absolute Percentage Error, PR proliferation rate, SL shoot length, LN leaf number, ES explant survival

Table 6
Statistical evaluation of the constructed models for the micropropagation of the pomegranate cultivar 'Shirineshahvar'
SVR Support Vector Regression, RF Random Forest, XGB Extreme Gradient Boosting, ENMLR Elastic Net Multivariate Linear Regression, ESR Ensemble Stacking Regression, R² coefficient of determination, RRMSE Relative Root Mean Square Error, RMSE Root Mean Square Error, MAPE Mean Absolute Percentage Error, PR proliferation rate, SL shoot length, LN leaf number, ES explant survival

Table 7
Ranking of the best-performing ML models for growth parameters of pomegranate
SVR Support Vector Regression, RF Random Forest, XGB Extreme Gradient Boosting, ENMLR Elastic Net Multivariate Linear Regression, ESR Ensemble Stacking Regression, GPI Global Performance Indicator, PR proliferation rate, SL shoot length, LN leaf number, ES explant survival. Bold values indicate the highest (best) values.

Table 8
Optimization of ZT and GA3 concentrations for the pomegranate cultivars according to the ESR-NSGA-II algorithm to obtain the best plant growth parameters
GA3 gibberellic acid, ZT zeatin, PR proliferation rate, SL shoot length, LN leaf number, ES explant survival